In this worked example, we’ll take a look at the Meuse river data set from the sp package1.
You of course need to download and install the package if you don’t have
it already. Then make the data set available using data(meuse).
library(sp)
data(meuse)This map data is just a simple data.frame, but since we will be working
with simple features, we need to convert this data set to such. First, however,
notice that if
you look at the x and y values here, it’s clear that they’re actually
not longitude and latitude measurements.
By looking at the documentation for meuse, we instead see
that they are in fact Rijksdriehoek (RDH) (Netherlands topographical) map
coordinates. To everything work out correctly later on, we need to make
sure that our simple features data understands this information.
To convert the data into a simple features object, we use the
st_as_sf() function, using the coords argument to specify which
variables represent our x and y coordinates, and then we use the crs
argument to denote which projection system they are measured in.
RDH coordinates have the EPSG code 28992. So we use that as input.
library(sf)
meuse_sf <-
meuse %>%
st_as_sf(coords = c("x", "y"), crs = 28992) %>%
st_transform(crs = 4326)Before we dig into this data set, let’s set up the ggplot2 theme for mapping.
This time, we use the theme_map() (instead of theme_void(), which we used
previously) from the ggthemes
package.
library(tidyverse)
library(ggthemes)
theme_set(theme_map(base_size = 11))Let’s start with a simple plot, looking at lead, creating a bubble plot by mapping the presence of lead to size of the points.
ggplot(meuse_sf) +
geom_sf(aes(size = lead)) +
theme(legend.position = "right")Figure 1: Concentration of lead around the river Meuse.
The meuse data set is so called because it’s related to pollution data
around the river Meuse, so it makes sense that we actually try to include
the location of this river in our plot too.
To do so, we need some map data that represents this river. Our trusty
rnaturalearth package yet again comes to the rescue, although this
time the data is actually not included directly in the package
but must instead be downloaded using the ne_download() function.
Let’s begin by downloading the data.
library(rnaturalearth)
rivers <- ne_download(
scale = 10,
type = "rivers_lake_centerlines",
category = "physical",
returnclass = "sf"
)rivers contains rivers and lakes from all across the world, which means
that if we were to plot it, we would get a lot of information we don’t need.
There are two ways around this, but either way we first need to figure out
a bounding box that includes the coordinates from the data set, perhaps
with a little extra margin. As I hope you’ve started to realize by now,
there often is a st_*() function for whatever mapping related issue
you can think of. This is the case here to: st_bbox().
bbox <- st_bbox(meuse_sf)
bbox## xmin ymin xmax ymax
## 5.72319 50.95661 5.76304 50.99156
The first and dirtiest method for plotting the data is just to use the
coord_sf() function to provide limits using our bbox object.
There’s nothing wrong with using this method, and we’ll stick to it here,
but we note that a more elegant way might be to use st_crop(), which
works directly with our bounding box above.
# Alternative solution: river_meuse <- st_crop(rivers, bbox)
ggplot(meuse_sf) +
geom_sf(aes(size = lead)) +
geom_sf(data = rivers) +
coord_sf(
xlim = c(bbox[["xmin"]], bbox[["xmax"]]),
ylim = c(bbox[["ymin"]], bbox[["ymax"]])
) +
theme(legend.position = "right")Regrettably, this is not quite what we would like to see. The resolution of this vector raster is too low and the river ends up being just a line. Let’s try something different.
What we’ll do instead is to plot a raster map of the river and the surrounding area. We’ll download the map from stamen maps, using ggmap.
library(ggmap)To download a map, we need to call get_stamenmap() and specify a bounding
box. Fortunately for us, we already computed the bounding box of the points in
the last step, so let’s use that.
Note: Stamen Maps have been incorporated into Stadia maps and get_stamenmap() is currently returning brown warning tiles occasionally and will stop working completely after October 31. So you may not see the same result here if you follow along.
# extend the ranges slightly
xrng <- extendrange(c(bbox$xmin, bbox$xmax))
yrng <- extendrange(c(bbox$ymin, bbox$ymax))
meuse_raster <- get_stamenmap(
c(
left = xrng[1],
bottom = yrng[1],
right = xrng[2],
top = yrng[2]
),
zoom = 14,
maptype = "terrain"
)To then build the map, we use the ggmap() function instead of ggplot().
Now we run into a hurdle, however, because ggmap and sf unfortunately
does not play so well together2. As it stands, right now, it is
therefore better to use the standard map features when working with raster maps
from ggmap. To do so, we convert our meuse coordinates from
simple features back to a standard data frame.
meuse_df <- cbind(
st_drop_geometry(meuse_sf),
st_coordinates(meuse_sf)
)Now we can use the raster map together with our points, simply
overlaying them using the geom_point() function.
ggmap(meuse_raster) +
geom_point(aes(X, Y, size = lead), data = meuse_df) +
theme(legend.position = "right")Let’s make the plot slightly more interesting by including data on the other heavy metals too. One way to do so it to use facets, but for that to work, we need to have our data in a long format, where one column denotes the variable names and another notes their values.
meuse_df_long <-
meuse_df %>%
pivot_longer(
cadmium:zinc,
names_to = "metal",
values_to = "concentration"
)We construct our plot as before, faceting on the type of metal.
ggmap(meuse_raster) +
geom_point(
aes(X, Y, size = concentration),
alpha = 0.5,
data = meuse_df_long
) +
facet_wrap("metal") +
theme(legend.position = "right")Figure 2: Concentration of different heavy metals around the river Meuse in the Netherlands.
Purely for instructional reasons, let’s assume that we were considering some kind of cleanup of this heavy metal waste, but wanted to see how this would affect affect the ruins of the nearby Kasteel Stein by displaying it on the map.
We don’t know the coordinates of the castle off-hand, so let’s find them out by geocoding. In the lecture, we used the facilities of ggmap for geocoding, but unfortunately the only API that is currently supported in ggmap is Google’s API, which is not free.
Another alternative is to use tidygeocoder, which features a host of geocoding alternatives. A couple of these are free, for instance the Nominatim API.3
Start by generating a tibble (or data.frame) with the names
and addresses that you want to geocode.
library(tidygeocoder)
addresses <- tribble(
~name, ~address,
"Kasteel Stein", "Kasteel Stein, Netherlands"
)Next, pipe these addresses on to tidygeocoder::geocode().4
kasteel_stein <-
addresses %>%
tidygeocoder::geocode(
address, # map to address
method = "osm" # for Nominatim API
)
kasteel_stein## # A tibble: 1 × 4
## name address lat long
## <chr> <chr> <dbl> <dbl>
## 1 Kasteel Stein Kasteel Stein, Netherlands 51.0 5.76
To finish our map, we complement our previous plot with two layers for our new, geocoded data, to show a label and a dot for the castle ruins.
ggmap(meuse_raster) +
geom_point(
aes(X, Y, size = concentration),
alpha = 0.5,
data = meuse_df_long
) +
geom_point(
aes(long, lat),
col = "red",
data = kasteel_stein
) +
geom_label(
aes(long, lat, label = name),
vjust = -0.5,
data = kasteel_stein
) +
facet_wrap("metal") +
theme(legend.position = "right")Figure 3: Concentration of different heavy metals around the river Meuse in the Netherlands.
The source code for this document is available at https://github.com/stat-lu/dataviz/blob/main/worked-examples/heavy-metal.Rmd.
The sp package is a full-featured solution for visualization data, and can be used in its own right to create maps in R.↩︎
In this case, the problem would actually be hardly noticeable because the map is on a small scale and the problem, which occurs because the bounding box of the raster is in a different projection, results in no noticeable distorsion.↩︎
The API is free, but it is also rate-limited at one request per second and not as powerful as some of the other available APIs. If you have a project where you need more serious geocoding, you should probably get a key for one of the other APIs.↩︎
There is a
namespace clash here with ggmap, which has its own ggmap::geocode()
function, so we need to specify which function to use with :: here.↩︎